Sorting ======= What's Sorting? Putting a sequence of items in order Why is Sorting Important? 1. People can process data better when it is sorted Words in a dictionary Index of a book Card catalog in a library 2. Some algorithms can be faster when input is sorted Binary search Finding duplicates How do you find duplicates in an unsorted array? e.g., 8 2 4 1 2 4 Quadratic time boolean duplicates (Object[] a) { for (int i = 0; i < a.length; i++) for (int j = i+1; j < a.length; j++) if (a[i].equals(a[j])) return true; return false; } How do you find duplicates in a sorted array? 1 2 2 4 4 8 Linear time Selection Sort -------------- How does Selection Sort work? Keep boundary between sorted and unsorted parts Sorted part starts empty Find smallest item in unsorted part Swap with first item in unsorted part Sorted part is now larger by 1 Show each step of Selection Sort on the array: 8 2 4 1 2 4 |8 2 4 1 2 4 1|2 4 8 2 4 1 2|4 8 2 4 1 2 2|8 4 4 1 2 2 4|8 4 1 2 2 4 4|8 void selectionSort (Comparable[] a) { for (int i = 0; i < a.length-1; i++) { int min = i; for (int j = i+1; j < a.length; j++) if (a[j].compareTo(a[min]) < 0) min = j; Comparable tmp = a[min]; a[min] = a[i]; a[i] = tmp; } } What kind of input is worst-case for Selection sort? Doesn't matter How many compares? (for Selection sort on any input) O(n^2) How many swaps? (for Selection sort on any input) O(n) (a) Sort.java (SelectionSort()) running time on sorted, reverse, random data Insertion Sort -------------- How does Insertion Sort work? Keep boundary between sorted and unsorted parts Sorted part starts with first item Take first item in unsorted part and insert into sorted part Sorted part is now larger by 1 Show each step of Insertion Sort on the array: 8 2 4 1 2 4 8|2 4 1 2 4 2 8|4 1 2 4 2 4 8|1 2 4 1 2 4 8|2 4 1 2 2 4 8|4 1 2 2 4 4 8 void insertionSort (Comparable[] a) { for (int p = 1; p < a.length; p++) { Comparable tmp = a[p]; int j; for (j = p; j > 0 && tmp.compareTo(a[j-1]) < 0; j--) a[j] = a[j-1]; a[j] = tmp; } } What kind of input is worst-case for insertion sort? Reverse sorted input How many compares? (for insertion sort on reverse-sorted input) O(n^2) How many swaps? (for insertion sort on reverse-sorted input) O(n^2) What kind of input is best-case for insertion sort? Sorted input How many compares? (for insertion sort on sorted input) O(n) How many swaps? (for insertion sort on sorted input) None How does Insertion Sort perform on random input? How many compares? (for insertion sort on random input) O(n^2) How many swaps? (for insertion sort on random input) O(n^2) (b) Sort.java (InsertionSort()) running time on sorted, reverse, random data Mergesort --------- How does Mergesort work? Divide and conquer If size of array is 0 or 1, return Recursively sort first and second halves Merge two sorted halves into one Give the array after the recursive calls sort each half: 8 2 4 1 2 4 2 4 8 1 2 4 How do you merge the sorted halves? Allocate a temporary array Repeatedly copy the smaller value from each half into the temp array When either half runs out, copy the rest of the other to temp array Show the steps of merging the two arrays 1 2 4 8 2 3 4 5 Show each step of Mergesort on the array 8 2 4 1 2 4 mergesort 8 2 4 mergesort 8 2 mergesort 8 return 8 mergesort 2 return 2 merge 8 and 2 return 2 8 mergesort 4 return 4 merge 2 8 and 4 return 2 4 8 mergesort 1 2 4 mergesort 1 2 mergesort 1 return 1 mergesort 2 return 2 merge 1 and 2 return 1 2 mergesort 4 return 4 merge 1 2 and 4 return 1 2 4 merge 2 4 8 and 1 2 4 return 1 2 2 4 4 8 void mergeSort( Comparable [ ] a ) { mergeSort( a, 0, a.length - 1 ); } void mergeSort( Comparable [ ] a, int left, int right ) { if( left < right ) { int center = ( left + right ) / 2; mergeSort( a, left, center ); mergeSort( a, center + 1, right ); merge( a, left, center, right ); } } void merge( Comparable [ ] a, int left, int center, int right ) { int size = right - left + 1; Comparable [ ] tmp = new Comparable[ size ]; int i = 0; int leftPos = left; int rightPos = center+1; while( leftPos <= center && rightPos <= right ) if( a[ leftPos ].compareTo( a[ rightPos ] ) <= 0 ) tmp[ i++ ] = a[ leftPos++ ]; else tmp[ i++ ] = a[ rightPos++ ]; while( leftPos <= center ) tmp[ i++ ] = a[ leftPos++ ]; while( rightPos <= right ) tmp[ i++ ] = a[ rightPos++ ]; for( i = 0; i < size; i++ ) a[ left+i ] = tmp[ i ]; } (d) Sort.java (MergeSort()) running time on sorted, reverse, random data What's a disadvantage of Mergesort? Uses twice the memory What kind of input is worst-case for Mergesort? Doesn't matter What's the Big-Oh bound for merge? O(n) What's the Big-Oh bound for Mergesort? O(n log n) Quicksort --------- How does Quicksort work? Divide and conquer If size of array is 0 or 1, return Select any item in array as the pivot Put values smaller than pivot into group L Put values larger than pivot into group R Recursively sort L and R Return L followed by pivot followed by R Fastest known comparison-based sorting algorithm Show each step of Quicksort on the array. 8 5 4 6 2 3 1 choose pivot 4 (a lucky choice) partition into 2 3 1 and 8 5 6 quicksort 2 3 1 choose pivot 2 partition into 1 and 3 quicksort 1 return 1 quicksort 3 return 3 return 1 2 3 quicksort 8 5 6 choose pivot 6 partition into 5 and 8 quicksort 5 return 5 quicksort 8 return 8 return 5 6 8 return 1 2 3 4 5 6 8 void quicksort( Comparable [ ] a ) { quicksort( a, 0, a.length - 1 ); } void quicksort( Comparable [ ] a, int low, int high ) { if( low < high ) { int pivot = partition(a, low, high); quicksort( a, low, pivot - 1 ); quicksort( a, pivot + 1, high ); } } int partition (Comparable[] a, int low, int high) { Comparable pivot = a[low]; int i = low; int j = high+1; while (i < j) { do i++; while(i < j && a[ i ].compareTo( pivot ) < 0 ); do j--; while(i <= j && a[ j ].compareTo( pivot ) > 0 ); if (i < j) { Comparable temp = a[i]; a[i] = a[j]; a[j] = temp; } } a[low] = a[j]; a[j] = pivot; return j; } Can you prove QuickSort works? Assume recursive call sorts the items smaller than the pivot Assume recursive call sorts the items larger than the pivot The pivot falls between the smaller and larger items (e) Sort.java (QuickSortBad()) running time on sorted, reverse, random data What's the major disadvantage of Mergesort? Does Quicksort have this problem? Mergesort requires an extra array Quicksort partitions in-place, doesn't need extra array Why is Quicksort faster than Mergesort? Partition faster than merge Less copying of data What's the Big-Oh bound for the partition step? O(n) The result of partitioning greatly affects Quicksort's running time Bubble Sort ----------- How does Bubble Sort work? Pass over array N times Each time bubble the largest value to the end Compare adjacent elements Swap adjacent elements if needed Show each step of bubble sort on the array: 8 2 4 1 2 4 8 2 4 1 2 4 2 4 1 2 4 8 2 1 2 4 4 8 1 2 2 4 4 8 8 4 4 2 2 1 8 4 4 2 2 1 4 4 2 2 1 8 4 2 2 1 4 8 2 2 1 4 4 8 2 1 2 4 4 8 1 2 2 4 4 8 void bubbleSort (Comparable[] a) { for (int i = 0; i < a.length; i++) { for (int j = 0; j < a.length-1; j++) { if (a[j].compareTo(a[j+1]) > 0) { Comparable tmp = a[j]; a[j] = a[j+1]; a[j+1] = tmp; } } } } What kind of input is worst-case for bubble sort? Reverse sorted input How many compares? (for bubble sort on reverse-sorted input) O(n^2) How many swaps? (for bubble sort on reverse-sorted input) O(n^2) What kind of input is best-case for bubble sort? Sorted input How many compares? (for bubble sort on sorted input) O(n^2) How many swaps? (for bubble sort on sorted input) none How does Bubble Sort perform on random input? How many compares? (for bubble sort on random input) O(n^2) How many swaps? (for bubble sort on random input) O(n^2) (f) Sort.java (BubbleSort()) running time on sorted, reverse, random data How can we improve the above naive implementation? Stop as soon as a pass over the data produces no swaps (g) Sort.java (BetterBubbleSort()) running time on sorted, reverse, random data Revisiting Mergesort: Complexity Analysis ----------------------------------------- What's the Big-Oh bound for Mergesort? O(n log n) How can you derive this bound? What's the running time equation for Mergesort? T(n) = 2*T(n/2) + a*n T(1) = b (Note: this is a recurrence relation) How do you solve this recurrence relation? Rewrite it without recursion T(1) = b T(2) = 2*T(1) + 2a = 2b + 2a T(4) = 2*T(2) + 4a = 4b + 8a T(8) = 2*T(4) + 8a = 8b + 24a ... T(n) = b*n + a*n log n Generalizing the result What's a Divide and Conquer Algorithm? Divide problem into smaller subproblems of the same type Solve the subproblems recursively Combine the subproblem solutions into a solution for whole problem If the problem is small enough, solve it directly What's the running time equation for divide and conquer algorithms? T(N) = A*T(N/B) + O(N^k) A is the number of subproblems at each stage B is the relative size of subproblems (e.g., B=2 for half-sized) k is the overhead O(N^k) What's the solution to the general running time equation? for A >= 1 and B > 1, T(N) = O(N^(log[B]A)) for A > B^k T(N) = O(N^k log N) for A = B^k T(N) = O(N^k) for A < B^k What are A, B, and k for Mergesort? A = 2, B = 2, k = 1 Show how the general solution gives O(N log N) for Mergesort. A = B^k ==> T(N) = O(N^1 log N) Revisiting Quicksort: Complexity Analysis ----------------------------------------- What's the Best-Case partitioning for Quicksort? Partition gives two equal-size sets Recursively solve two half-size problems Happens at every step What's the running time formula for Quicksort in the Best-Case? T(n) = a*n + 2*T(n/2) What's the Big-Oh for Quicksort in the Best-Case? Same formula as Mergesort O(N log N) What's the Worst-Case partitioning for Quicksort? Partition gives one empty partition, other size N-1 Recursively solve size N-1 problem Happens at every step What's the running time formula for Quicksort in the Worst-Case? T(n) = a*n + T(n-1) T(1) = b What's the Big-Oh for Quicksort in the Worst-Case? T(1) = b T(2) = 2a + b T(3) = 3a + 2a + b T(4) = 4a + 3a + 2a + b T(5) = 5a + 4a + 3a + 2a + b ... T(n) = a[n(n+1)/2 - 1] + b ==> O(N^2) What's the Average-Case partitioning for Quicksort? All possible partitions are equally likely at every step What's the running time formula for Quicksort in the Average-Case? A bit complicated (see textbook) What's the Big-Oh for Quicksort in the Average-Case? O(N log N) What if all items are equal to the pivot? We can make the indices i and j either stop or keep moving when they encounter an item that is equal to the pivot Both should do the same so as not to bias the partitioning Consider the case when all items are identical Case 1: i and j stop Swap at each position Pivot is placed in middle Partitions are of (nearly) equal sizes ==> Best-case applies: O(N log N) Case 2: i and j keep moving No swaps Pivot is first item Partitions are widely uneven (1 / N-1) ==> Worst-case applies: O(N^2) What determines whether Quicksort runs in O(N log N) or O(N^2) time? [The choice of the pivot value] What's the outcome of using the first item as Pivot? (Note: This is a popular choice) Quadratic time on sorted/reverse data What's the outcome of using the middle item as Pivot? Works well on sorted/reverse data A reasonable choice Some inputs can still result in quadratic time What's the median of a list of numbers? Median is middle number in a sorted list Why not select the median value from the array as Pivot? (Note: This would always give perfect partitioning) Takes too long to compute the median How can you estimate the median without spending too much time? Sampling (still expensive, size of sample) Median-of-three: use median of first, middle and last A closer look at partitioning How does the Quicksort partitioning algorithm work? 1. swap pivot with first item 2. repeat run i from left to right run j from right to left stop i on large item stop j on small item if i and j not crossed, swap 3. swap position j with pivot Median-of-three Partitioning put smallest of three in first position put largest of three in last position put median/pivot in middle 1. swap middle (pivot) with next-to-last item 2. scan from low+1 and high-2 first/last items act as sentinels 3. swap position i with next-to-last item Show each step of median-of-three partitioning on the array 8 2 4 1 2 4 Speeding Up Quicksort Further ----------------------------- Note that for small arrays/lists, Insertion Sort is faster than Quicksort How can we use this fact to improve the Quicksort implementation? Switch to insertion sort at size CUTOFF (here, 10) Also avoids degenerate cases Improved algorithm, with median-of-three and insertionSort /** * Quicksort algorithm. * @param a an array of Comparable items. */ public static void quicksort( Comparable [ ] a ) { quicksort( a, 0, a.length - 1 ); } private static final int CUTOFF = 10; /** * Method to swap two elements in an array. * @param a an array of objects. * @param index1 the index of the first object. * @param index2 the index of the second object. */ public static final void swapReferences( Object [ ] a, int index1, int index2 ) { Object tmp = a[ index1 ]; a[ index1 ] = a[ index2 ]; a[ index2 ] = tmp; } /** * Internal quicksort method that makes recursive calls. * Uses median-of-three partitioning and a cutoff of 10. * @param a an array of Comparable items. * @param low the left-most index of the subarray. * @param high the right-most index of the subarray. */ private static void quicksort( Comparable [ ] a, int low, int high ) { if( low + CUTOFF > high ) insertionSort( a, low, high ); else { // Sort low, middle, high int middle = ( low + high ) / 2; if( a[ middle ].compareTo( a[ low ] ) < 0 ) swapReferences( a, low, middle ); if( a[ high ].compareTo( a[ low ] ) < 0 ) swapReferences( a, low, high ); if( a[ high ].compareTo( a[ middle ] ) < 0 ) swapReferences( a, middle, high ); // Place pivot at position high - 1 swapReferences( a, middle, high - 1 ); Comparable pivot = a[ high - 1 ]; // Begin partitioning int i, j; for( i = low, j = high - 1; ; ) { while( a[ ++i ].compareTo( pivot ) < 0 ) ; while( pivot.compareTo( a[ --j ] ) < 0 ) ; if( i >= j ) break; swapReferences( a, i, j ); } // Restore pivot swapReferences( a, i, high - 1 ); quicksort( a, low, i - 1 ); // Sort small elements quicksort( a, i + 1, high ); // Sort large elements } } /** * Internal insertion sort routine for subarrays * that is used by quicksort. * @param a an array of Comparable items. * @param low the left-most index of the subarray. * @param n the number of items to sort. */ private static void insertionSort( Comparable [ ] a, int low, int high ) { for( int p = low + 1; p <= high; p++ ) { Comparable tmp = a[ p ]; int j; for( j = p; j > low && tmp.compareTo( a[ j - 1 ] ) < 0; j-- ) a[ j ] = a[ j - 1 ]; a[ j ] = tmp; } } h) Sort.java (QuickSort()) running time on sorted, reverse, random data Lower Bound for Sorting ----------------------- Is it possible to sort faster than O(N log N)? Yes, using array indexing But, only works for restricted types (integers) Is it possible for comparison-based algorithms? No Any comparison-based sort must use N log N compares Must choose between N! permutations Each compare step divides problem in half log N! steps to arrive at 1 permutation O(log N!) is the same BigOh as O(N log N) Selection --------- What's Selection? Finding the kth smallest item Special case is median (n/2th smallest item) How can we use sorting to solve the selection problem? 1. Sort 2. Return item at position k What's the Big-Oh bound for this solution? N log N Can we do better? Yes Quickselect How does Quickselect work? Almost like quicksort Pick a pivot Partition the items Recurse only on side where kth item is located What's the Big-Oh bound for Quickselect? Can be shown to be O(N) on average Of course, worst-case is like Quicksort: O(N^2)